There are several packages and options available to construct and analyze ecological networks. Here, we will use the bipartite package (Dormann et al. 2008) to plot a basic site by species network and then apply Beckett’s (2016) DIRTLPAb+ algorithm to perform community detection by bipartite modularity analysis. Once the communities are found, we will work through some visualization options. For this demonstration, we will use phytoplankton community data from Alberta, Canada.
(from Loewen et al. 2021a. Bioregions are predominately climatic for fishes of northern lakes. Global Ecology and Biogeography)
Modularity analysis provides unsupervised learning of community structure based solely on network topology, where divisions are made to maximize the number of edges within, rather than between, resultant modules (Newman & Girvan 2004). While there are several methods available for clustering data, most distance-based procedures flatten bipartite networks, reducing species information to some measure of site dissimilarity. Modularity analysis can be used to maintain the bipartite nature of site-species networks, identifying both groups of sites that share distinct biotas and sets of species that tend to co-occur.
While modularity analysis was initially developed for the unipartite case (Newman & Girvan 2004), a bipartite formulation was proposed by Barber (2007) to allow links only between nodes of opposing types. Barber’s definition classifies both sets of nodes simultaneously and produces one-to-one node correspondence (i.e. it will produce the same number of modules for sites and species), associating assemblages directly to their physical locations.
Modularity optimization is computationally challenging (NP-hard; Miyauchi & Sukegawa 2015), necessitating heuristics to search through solution space of larger networks. The DIRTLPAb+ algorithm of Beckett (2016) is a label propagation approach to maximize Barber’s bipartite modularity. Briefly, the label propagation algorithm (LPA) for community detection was proposed by Raghavan et al. (2007), where nodes were initialized with all unique labels, updated iteratively to adopt the most common labels of their neighbours (with ties broken uniformly), and grouped together once no further improvements could be made. The routine was subsequently modified to the bipartite case (LPAb); however, it had a tendency to become trapped in local maxima. To escape such traps, Liu & Murata (2010) incorporated a multistep greedy agglomerative algorithm (LPAb+). Their approach applied asynchronous label propagation (updating node labels of each type in turn) but added a step where resulting groups could be merged if doing so improved the solution, iterating between propagation and aggregation phases. Finally, because label propagation is inherently stochastic, and thus outcomes can vary depending on initialization, Beckett (2016) proposed that LPAb+ should be repeated under different node configurations (i.e. numbers of unique starting labels) and report the greatest ensuing score (DIRTLPAb+).
For this tutorial, we will use R software (https://www.r-project.org/). We start by opening R, setting the working directory, and loading the bipartite and dplyr packages.
# Load packages
library(bipartite)
library(dplyr)
Next we read our ecological community data matrix containing species (columns) and sites (rows). The matrix can be binary (0s and 1s) or weighted (e.g. by abundance or biomass). For this demonstration, we will use data on phytoplankton communities in lakes and reservoirs across Alberta, Canada (see Loewen et al. 2020 for details). These data are presented as biomass for individual taxa at lakes and reservoirs on different sampling dates, and are available for download from the Dryad Digital Repository (https://doi.org/10.5061/dryad.gf1vhhmk7).
Download the data into your working directory and load it into the R environment. As data from multiple sampling events are provided fore each site, we will subset the matrix to work with records obtained in August only and remove extra fields.
# Read the species data
phyto.comm <- read.csv("Loewen_et_al._2020._Ecol_App._Species_data_-_final.csv", header = T)
# Subset data to select August sampling events
phyto.comm.aug.bio <- subset(phyto.comm, Month == "Aug")
# Set Lake.Name as row names
rownames(phyto.comm.aug.bio) <- phyto.comm.aug.bio$Lake.Name
# Remove extra fields
phyto.comm.aug.bio <- phyto.comm.aug.bio[, 5:308]
# Remove any species not found in August
i <- (colSums(phyto.comm.aug.bio) != 0)
phyto.comm.aug.bio <- phyto.comm.aug.bio[, i]
# Create binary community matrix (presence/absence) from weighted matrix
phyto.comm.aug.bin <- phyto.comm.aug.bio
phyto.comm.aug.bin[phyto.comm.aug.bin > 0] <- 1
Now that we have our data wrangled, we can visualize them as bipartite networks. Lets plot the unweighted bipartite graph first.
# Default bipartite graph plot
plotweb(phyto.comm.aug.bin, text.rot = 90)
Now plot a bipartite graph weighted by biomass.
# Default bipartite graph plot
plotweb(phyto.comm.aug.bio, text.rot = 90)
While there are too many species to make out the details in these plots, nodes at the top represent individual taxa while sites are represented along the bottom. Note that in the second (weighted) plot, thicker links between nodes indicate relatively greater biomass for a species at a particular site, whereas all links are the same thickness in the first (unweighted, presence/absence) plot. Node thickness also varies, indicating relative biomass in the weighted graph and relative occurrence in the unweighted graph.
While different approaches may reveal different patterns, we will use the computeModules function in the bipartite package to find groups of densely linked sites and species (i.e. modules) by maximizing Barber’s index. First, we will analyze the unweighted network.
# Set random seed for reproducibility of random process
set.seed(99)
# Conduct bipartite modularity analysis on binary network using Beckett's algorithm
bin.DIRT <- computeModules(phyto.comm.aug.bin, method = "Beckett")
# Determine number of modules
nrow(bin.DIRT@modules) - 1
## [1] 6
# Calculate participation coefficient
bin.DIRT.cz.higher <- czvalues(bin.DIRT, level = "higher")
bin.DIRT.cz.lower <- czvalues(bin.DIRT, level = "lower")
# Check modularity (Q) value of proposed module structure
bin.DIRT@likelihood
## [1] 0.2477575
The algorithm detected six modules with a modularity value of 0.2477575, but are these modules truly non-random?
For this, we can compare the computed value to those obtained from a series of random networks. The issue of obtaining random networks is also non-trivial, and several approaches exist. Here, we will use the sequentially swapping curveball algorithm (Strona et al. 2014), which has been shown to efficiently produce uniformly distributed null matrices that maintain row (site) and column (species) totals for presence/absence data. Because the approach is sequential (i.e. further randomized with each step), it requires a large number of iterations as well as a burn-in period. We will use the oecosimu function in the vegan package (Oksanen et al. 2020) to run the simulations in parallel. We will use 5,000 burn-in and thinning steps, which discard iterations at the beginning and between each sample, respectively.
# Load packages
library(parallel)
library(vegan)
# Create a function that captures the modularity values of module structures
func <- function(x) computeModules(x)@likelihood
# Set random seed for reproducibility
set.seed(99)
# Detect and set the number of available cores to run simulations in parallel
mc.cores <- parallel::detectCores()
clus <- makeCluster(mc.cores)
clusterEvalQ(clus, library(bipartite))
# Calculate module structure for 20 null weighted networks, obtained using the curveball algorithm, and conduct one-sided randomization test
bin.test <- oecosimu(comm = phyto.comm.aug.bin, nestfun = func, method = "curveball", nsimul = 20, burnin = 5000, thin = 5000, statistic = "likelihood", alternative = "greater", parallel = clus)
# Stop the cluster
stopCluster(clus)
# Extract P-value
bin.test[["oecosimu"]][["pval"]]
## [1] 0.04761905
# Plot observed and simulated modularity values
plot(density(bin.test[["oecosimu"]][["simulated"]]), xlim = c(min(bin.test[["statistic"]], min(bin.test[["oecosimu"]][["simulated"]])), max(bin.test[["statistic"]], max(bin.test[["oecosimu"]][["simulated"]]))), main = "Modularity Null Comparisons", xlab = "Modularity (Q)", ylab = "Density")
abline(v = bin.test[["statistic"]], col = "red", lwd = 2)
In this plot, the red line represents the modularity statistic for the proposed module structure, while the density plot shows values obtained for random networks. As we can see, the proposed module structure is highly non-random, as the observed modularity statistic is greater than any obtained from the random networks (P < 0.05). However, while we only generated 20 null matrices for demonstration purposes, several hundred (or thousand) should be used to obtain a null distribution for formal analyses (this can take a long time with large networks). It is also important to check that null matrices generated by sequential swapping procedures have converged on a random structure. Autocorrelation is avoided by using large numbers of burn-in and thinning steps.
Now, lets visualize the proposed module structure. We’ll start with the site modules. Here, we’ll use ggmap and ggplot2 (supplemented with ggsn to add in a scale bar and north arrow). We’ll extract site modules and merge these results with site coordinates (obtained from the environmental data matrix in Loewen et al. 2020) and plot the results on a terrain basemap.
# Load packages
library(ggmap)
library(ggsn)
library(ggplot2)
# Read the environmental data
phyto.env <- read.csv("Loewen_et_al._2020._Ecol_App._Environmental_data_-_final.csv", header = T)
phyto.env.aug <- subset(phyto.env, Month == "Aug")
rownames(phyto.env.aug) <- phyto.env.aug$Lake.Name
# Extract the site module structure
bin.DIRT.list <- listModuleInformation(bin.DIRT)
bin.DIRT.site.mod.1 <- bin.DIRT.list[[2]][[1]][[1]]
bin.DIRT.site.mod.1.vector <- as.matrix(if_else(rownames(bin.DIRT@moduleWeb) %in% bin.DIRT.site.mod.1, 1, 0))
bin.DIRT.site.mod.2 <- bin.DIRT.list[[2]][[2]][[1]]
bin.DIRT.site.mod.2.vector <- as.matrix(if_else(rownames(bin.DIRT@moduleWeb) %in% bin.DIRT.site.mod.2, 1, 0))
bin.DIRT.site.mod.3 <- bin.DIRT.list[[2]][[3]][[1]]
bin.DIRT.site.mod.3.vector <- as.matrix(if_else(rownames(bin.DIRT@moduleWeb) %in% bin.DIRT.site.mod.3, 1, 0))
bin.DIRT.site.mod.4 <- bin.DIRT.list[[2]][[4]][[1]]
bin.DIRT.site.mod.4.vector <- as.matrix(if_else(rownames(bin.DIRT@moduleWeb) %in% bin.DIRT.site.mod.4, 1, 0))
bin.DIRT.site.mod.5 <- bin.DIRT.list[[2]][[5]][[1]]
bin.DIRT.site.mod.5.vector <- as.matrix(if_else(rownames(bin.DIRT@moduleWeb) %in% bin.DIRT.site.mod.5, 1, 0))
bin.DIRT.site.mod.6 <- bin.DIRT.list[[2]][[6]][[1]]
bin.DIRT.site.mod.6.vector <- as.matrix(if_else(rownames(bin.DIRT@moduleWeb) %in% bin.DIRT.site.mod.6, 1, 0))
# Wrangle site module structure
bin.DIRT.site.mods <- cbind(bin.DIRT.site.mod.1.vector, bin.DIRT.site.mod.2.vector, bin.DIRT.site.mod.3.vector, bin.DIRT.site.mod.4.vector, bin.DIRT.site.mod.5.vector, bin.DIRT.site.mod.6.vector)
rownames(bin.DIRT.site.mods) <- rownames(bin.DIRT@moduleWeb)
bin.DIRT.site.mods <- merge(bin.DIRT.site.mods, bin.DIRT.cz.lower$c, by = "row.names")
rownames(bin.DIRT.site.mods) <- bin.DIRT.site.mods$Row.names
bin.DIRT.site.mods <- bin.DIRT.site.mods[, -1]
colnames(bin.DIRT.site.mods) <- c("Mod01", "Mod02", "Mod03", "Mod04", "Mod05", "Mod06", "c")
coords2var <- cbind(phyto.env.aug$Latitude.N, phyto.env.aug$Longitude.W)
coords2var <- as.data.frame(coords2var, row.names = row.names(phyto.env.aug))
colnames(coords2var) <- c("Latitude", "Longitude")
bin.DIRT.site.mods <- merge(bin.DIRT.site.mods, coords2var, by = "row.names")
rownames(bin.DIRT.site.mods) <- bin.DIRT.site.mods$Row.names
bin.DIRT.site.mods <- bin.DIRT.site.mods[, -1]
bin.DIRT.site.mods['Mods'] <- NA
bin.DIRT.site.mods$Mods[bin.DIRT.site.mods$Mod01 == 1] <- "Mod01"
bin.DIRT.site.mods$Mods[bin.DIRT.site.mods$Mod02 == 1] <- "Mod02"
bin.DIRT.site.mods$Mods[bin.DIRT.site.mods$Mod03 == 1] <- "Mod03"
bin.DIRT.site.mods$Mods[bin.DIRT.site.mods$Mod04 == 1] <- "Mod04"
bin.DIRT.site.mods$Mods[bin.DIRT.site.mods$Mod05 == 1] <- "Mod05"
bin.DIRT.site.mods$Mods[bin.DIRT.site.mods$Mod06 == 1] <- "Mod06"
bin.DIRT.site.mods$Mods <- as.factor(bin.DIRT.site.mods$Mods)
# Plot binary module map
basemap <- get_stamenmap(bbox = c(left = -max(bin.DIRT.site.mods$Longitude) - 2, bottom = min(bin.DIRT.site.mods$Latitude) - 2, right = -min(bin.DIRT.site.mods$Longitude) + 2, top = max(bin.DIRT.site.mods$Latitude) + 2), zoom = 5, maptype = "terrain-background")
bin.site.mod.map <- ggmap(basemap, maprange = TRUE) +
geom_point(data = bin.DIRT.site.mods, aes(-Longitude, Latitude, color = Mods), shape = 16, size = 3) +
geom_point(data = bin.DIRT.site.mods, aes(-Longitude, Latitude), color = "black", shape = 1, size = 3) +
scale_x_continuous("Longitude", breaks = c(-120, -115, -110),
labels = c("-120", "-115", "-110"), limits = c(-120, -110)) +
scale_y_continuous("Latitude", breaks = c(49, 54, 59),
labels = c("49", "54", "59"), limits = c(49, 59.7)) +
labs(color = "Modules", title = "Unweighted site modules") +
scalebar(x.min = attr(basemap, "bb")[[2]],
y.min = attr(basemap, "bb")[[1]],
x.max = attr(basemap, "bb")[[4]],
y.max = attr(basemap, "bb")[[3]],
dist = 100, anchor = c(x = -119.6, y = 49.6), transform = T,
location = "bottomleft", st.size = 3, st.dist = 0.032, dist_unit = "km") +
theme(panel.border = element_rect(colour = "black", fill = NA),
plot.title = element_text(hjust = 0.5),
legend.key = element_rect(fill = "white"))
north2(bin.site.mod.map, x = 0.9, y = 0.9, symbol = 16)
The six modules are distributed across the province, showing no clear spatial pattern, and suggesting a potentially important role of local factors (including water quality and biotic interactions) in driving community differentiation. We can also plot the participation coefficients of each site, which is a measure of among-module connectivity (Guimerà & Amaral 2005). Generally, higher participation coefficients indicate transition zones in spatially structured data.
# Plot binary participation coefficient map
bin.site.pc.map <- ggmap(basemap, maprange = TRUE) +
geom_point(data = bin.DIRT.site.mods, aes(-Longitude, Latitude, color = c), shape = 16, size = 3) +
geom_point(data = bin.DIRT.site.mods, aes(-Longitude, Latitude), color = "black", shape = 1, size = 3) +
scale_x_continuous("Longitude", breaks = c(-120, -115, -110),
labels = c("-120", "-115", "-110"), limits = c(-120, -110)) +
scale_y_continuous("Latitude", breaks = c(49, 54, 59),
labels = c("49", "54", "59"), limits = c(49, 59.7)) +
labs(color = "Participation\ncoefficient", title = "Unweighted site modules") +
scale_colour_gradientn(colours = terrain.colors(10)) +
scalebar(x.min = attr(basemap, "bb")[[2]],
y.min = attr(basemap, "bb")[[1]],
x.max = attr(basemap, "bb")[[4]],
y.max = attr(basemap, "bb")[[3]],
dist = 100, anchor = c(x = -119.6, y = 49.6), transform = T,
location = "bottomleft", st.size = 3, st.dist = 0.032, dist_unit = "km") +
theme(panel.border = element_rect(colour = "black", fill = NA),
plot.title = element_text(hjust = 0.5),
legend.key = element_rect(fill = "white"))
north2(bin.site.pc.map, x = 0.9, y = 0.9, symbol = 16)
Again, we see no clear spatial patterns, reinforcing the suggestion that local factors structure community differentiation at this scale of analysis.
Now lets plot the species modules. We’ll extract the species modules, wrangle them, and produce a hierarchical edge bundling plot using ggraph and igraph. We’ll also bring in some trait data to highlight groups of phytoplankton with specific attributes. Trait data are available from Supporting Information S2 in Loewen et al. (2021b) available at https://doi.org/10.1002/lno.11694.
# Load packages
library(igraph)
library(ggraph)
library(gridExtra)
# Load phytoplankton trait data
phyto.traits <- read.csv("lno11694-sup-0002-supinfo02.csv", header = T)
phyto.traits$SPECIES.NAME <- gsub(" ", ".", phyto.traits$SPECIES.NAME)
phyto.traits$SPECIES.NAME <- gsub("\\.\\.", ".", phyto.traits$SPECIES.NAME)
rownames(phyto.traits) <- phyto.traits$SPECIES.NAME
# Extract the species module structure
bin.DIRT.sp.mod.1 <- bin.DIRT.list[[2]][[1]][[2]]
bin.DIRT.sp.mod.1.vector <- as.matrix(if_else(colnames(bin.DIRT@moduleWeb) %in% bin.DIRT.sp.mod.1, 1, 0))
bin.DIRT.sp.mod.2 <- bin.DIRT.list[[2]][[2]][[2]]
bin.DIRT.sp.mod.2.vector <- as.matrix(if_else(colnames(bin.DIRT@moduleWeb) %in% bin.DIRT.sp.mod.2, 1, 0))
bin.DIRT.sp.mod.3 <- bin.DIRT.list[[2]][[3]][[2]]
bin.DIRT.sp.mod.3.vector <- as.matrix(if_else(colnames(bin.DIRT@moduleWeb) %in% bin.DIRT.sp.mod.3, 1, 0))
bin.DIRT.sp.mod.4 <- bin.DIRT.list[[2]][[4]][[2]]
bin.DIRT.sp.mod.4.vector <- as.matrix(if_else(colnames(bin.DIRT@moduleWeb) %in% bin.DIRT.sp.mod.4, 1, 0))
bin.DIRT.sp.mod.5 <- bin.DIRT.list[[2]][[5]][[2]]
bin.DIRT.sp.mod.5.vector <- as.matrix(if_else(colnames(bin.DIRT@moduleWeb) %in% bin.DIRT.sp.mod.5, 1, 0))
bin.DIRT.sp.mod.6 <- bin.DIRT.list[[2]][[6]][[2]]
bin.DIRT.sp.mod.6.vector <- as.matrix(if_else(colnames(bin.DIRT@moduleWeb) %in% bin.DIRT.sp.mod.6, 1, 0))
# Wrangle species module structure
bin.DIRT.sp.mods <- cbind(bin.DIRT.sp.mod.1.vector, bin.DIRT.sp.mod.2.vector, bin.DIRT.sp.mod.3.vector, bin.DIRT.sp.mod.4.vector, bin.DIRT.sp.mod.5.vector, bin.DIRT.sp.mod.6.vector)
rownames(bin.DIRT.sp.mods) <- colnames(bin.DIRT@moduleWeb)
bin.DIRT.sp.mods <- merge(bin.DIRT.sp.mods, bin.DIRT.cz.higher$c, by = "row.names")
rownames(bin.DIRT.sp.mods) <- bin.DIRT.sp.mods$Row.names
bin.DIRT.sp.mods <- bin.DIRT.sp.mods[, -1]
colnames(bin.DIRT.sp.mods) <- c("Mod01", "Mod02", "Mod03", "Mod04", "Mod05", "Mod06", "c")
bin.DIRT.sp.mods['Mods'] <- NA
bin.DIRT.sp.mods$Mods[bin.DIRT.sp.mods$Mod01 == 1] <- "Mod01"
bin.DIRT.sp.mods$Mods[bin.DIRT.sp.mods$Mod02 == 1] <- "Mod02"
bin.DIRT.sp.mods$Mods[bin.DIRT.sp.mods$Mod03 == 1] <- "Mod03"
bin.DIRT.sp.mods$Mods[bin.DIRT.sp.mods$Mod04 == 1] <- "Mod04"
bin.DIRT.sp.mods$Mods[bin.DIRT.sp.mods$Mod05 == 1] <- "Mod05"
bin.DIRT.sp.mods$Mods[bin.DIRT.sp.mods$Mod06 == 1] <- "Mod06"
bin.DIRT.sp.mods$Mods <- as.factor(bin.DIRT.sp.mods$Mods)
# Create adjacency matrix and wrangle edge lists
bin.sp.adjac <- as.one.mode(phyto.comm.aug.bin, fill = 0, project = "higher", weighted = TRUE)
bin.sp.origins <- data.frame(c("origin", "origin", "origin", "origin", "origin", "origin"),
c("Mod01", "Mod02", "Mod03", "Mod04", "Mod05", "Mod06"))
names(bin.sp.origins) <- c("from", "to")
bin.sp.mods.edges <- as.data.frame(bin.DIRT.sp.mods$Mods)
colnames(bin.sp.mods.edges) <- c("from")
bin.sp.mods.edges$to <- rownames(bin.DIRT.sp.mods)
bin.sp.mods.edges <- bin.sp.mods.edges[order(bin.sp.mods.edges$from), ]
bin.sp.mods.edges <- rbind(bin.sp.origins, bin.sp.mods.edges)
bin.sp.temp.graph <- graph_from_adjacency_matrix(bin.sp.adjac, mode = "undirected", weighted = TRUE, diag = FALSE)
bin.sp.temp.graph.edge.att <- E(bin.sp.temp.graph)$weight
bin.sp.temp.graph.edge.list <- get.edgelist(bin.sp.temp.graph)
bin.sp.mods.connect <- as.data.frame(cbind(bin.sp.temp.graph.edge.list, bin.sp.temp.graph.edge.att))
colnames(bin.sp.mods.connect) <- c("from", "to", "value")
bin.sp.mods.connect$value <- as.numeric(bin.sp.mods.connect$value)
bin.sp.mods.colsums <- as.data.frame(colSums (phyto.comm.aug.bin))
bin.sp.mods.colsums.match <- as.data.frame(bin.sp.mods.colsums$`colSums(phyto.comm.aug.bin)`[match(bin.sp.mods.edges$to, rownames(bin.sp.mods.colsums))])
bin.sp.mods.colsums.match <- rbind(c(NA), bin.sp.mods.colsums.match)
colnames(bin.sp.mods.colsums.match) <- "occur"
# Create data.frame of vertices
bin.sp.mods.vertices <- data.frame(name = unique(c(as.character(bin.sp.mods.edges$from), as.character(bin.sp.mods.edges$to))), value = bin.sp.mods.colsums.match)
bin.sp.mods.vertices$group <- bin.sp.mods.edges$from[match(bin.sp.mods.vertices$name, bin.sp.mods.edges$to)]
# Calculate angle of labels
bin.sp.mods.vertices$id <- NA
bin.sp.mods.myleaves <- which(is.na(match(bin.sp.mods.vertices$name, bin.sp.mods.edges$from)))
bin.sp.mods.nleaves <- length(bin.sp.mods.myleaves)
bin.sp.mods.vertices$id[bin.sp.mods.myleaves]<-seq(1:bin.sp.mods.nleaves)
bin.sp.mods.vertices$angle <- 90 - 360 * (bin.sp.mods.vertices$id - 22) / bin.sp.mods.nleaves
# Calculate the alignment of labels
bin.sp.mods.vertices$hjust <- ifelse((bin.sp.mods.vertices$angle < 90.01 & bin.sp.mods.vertices$angle > -90), 1, 0)
# Flip label angles
bin.sp.mods.vertices$angle <- ifelse(bin.sp.mods.vertices$angle < -90 | bin.sp.mods.vertices$angle > 90, bin.sp.mods.vertices$angle + 180, bin.sp.mods.vertices$angle)
# Add trait data
bin.sp.mods.vertices <- merge(bin.sp.mods.vertices, phyto.traits["SUSPECTED_TOXIN_PRODUCING"], by.x = "name", by.y = 0, all.x = TRUE)
bin.sp.mods.vertices <- merge(bin.sp.mods.vertices, phyto.traits["SILICA_USING"], by.x = "name", by.y = 0, all.x = TRUE)
bin.sp.mods.vertices <- merge(bin.sp.mods.vertices, phyto.traits["CHL_B"], by.x = "name", by.y = 0, all.x = TRUE)
bin.sp.mods.vertices <- merge(bin.sp.mods.vertices, phyto.traits["CHL_C"], by.x = "name", by.y = 0, all.x = TRUE)
# Create a graph object
bin.sp.mods.mygraph <- graph_from_data_frame(bin.sp.mods.edges, vertices = bin.sp.mods.vertices)
# Create connection objects
bin.sp.mods.from <- match(bin.sp.mods.connect$from, bin.sp.mods.vertices$name)
bin.sp.mods.to <- match(bin.sp.mods.connect$to, bin.sp.mods.vertices$name)
# Plot binary hierarchical edge bundling plot
bin.sp.edge.group.plot <- ggraph(bin.sp.mods.mygraph, layout = 'dendrogram', circular = TRUE) +
geom_conn_bundle(data = get_con(from = bin.sp.mods.from, to = bin.sp.mods.to, value = bin.sp.mods.connect$value),
edge_alpha = 0.1, width = 0.3, tension = 0.8, aes(colour = value), show.legend = F) +
scale_edge_color_continuous(low = "cornsilk2", high = "red4") +
geom_node_text(aes(x = x * 1.15, y = y * 1.15, filter = leaf, label = name, angle = angle, hjust = hjust, colour = group),
size = 3, alpha = 0.6) +
geom_node_point(aes(filter = leaf, x = x * 1.07, y = y * 1.07, fill = factor(group), size = occur),
alpha = 0.6, shape = 21, stroke = 0, show.legend = F) +
scale_size_continuous(range = c(0.1, 10)) +
labs(color = "Modules", title = "Unweighted species modules") +
theme_void() +
theme(plot.title = element_text(hjust=0.5),
plot.margin = unit(c(0, 0, 0, 0), "cm")) +
expand_limits(x = c(-1.5, 1.5), y = c(-1.5, 1.5))
bin.sp.edge.group.plot
This plot shows the module structure of phytoplankton species, with different colours indicating community groups. Node size indicates the frequency that each species occurs in the dataset and edges reflect species co-occurrence. Darker edges indicate more instances of co-occurrence; however, patterns may be obscured in cases with large numbers of co-occurring taxa.
We can see that module 6 (in pink) is the most species rich, while module 3 (in green) is the most species poor. We can also observe that module 2 (in gold) has several taxa that occur in a large number of lakes, including Plagioselmis nannoplanctica, Katablepharis ovalis, and species of the genus Chromulinaceae.
We can also visualize the distribution of traits among species modules. For example, we can highlight those taxa that are suspected toxin producers, silica users, or chlorophyll b and c producers.
# Plot identifying suspected toxin producers
bin.sp.edge.toxin.plot <- ggraph(bin.sp.mods.mygraph, layout = 'dendrogram', circular = TRUE) +
geom_conn_bundle(data = get_con(from = bin.sp.mods.from, to = bin.sp.mods.to, value = bin.sp.mods.connect$value),
edge_alpha = 0.1, width = 0.3, tension = 0.8, aes(colour = value)) +
scale_edge_color_continuous(low = "cornsilk2", high = "red4") +
geom_node_text(aes(x = x * 1.15, y = y * 1.15, filter = leaf, label = name, angle = angle,
hjust = hjust, colour = SUSPECTED_TOXIN_PRODUCING), size = 3, alpha = 0.6) +
geom_node_point(aes(filter = leaf, x = x * 1.07, y = y * 1.07, fill = factor(group), size = occur),
alpha = 0.6, shape = 21, stroke = 0) +
scale_size_continuous(range = c(0.1, 10)) +
labs(title = "Suspected toxin producers") +
theme_void() +
theme(legend.position = "none",
plot.title = element_text(hjust=0.5),
plot.margin = unit(c(0, 0, 0, 0), "cm")) +
expand_limits(x = c(-1.5, 1.5), y = c(-1.5, 1.5))
bin.sp.edge.toxin.plot
# Plot identifying species with silica requirement
bin.sp.edge.silica.plot <- ggraph(bin.sp.mods.mygraph, layout = 'dendrogram', circular = TRUE) +
geom_conn_bundle(data = get_con(from = bin.sp.mods.from, to = bin.sp.mods.to, value = bin.sp.mods.connect$value),
edge_alpha = 0.1, width = 0.3, tension = 0.8, aes(colour = value)) +
scale_edge_color_continuous(low = "cornsilk2", high = "red4") +
geom_node_text(aes(x = x * 1.15, y = y * 1.15, filter = leaf, label = name, angle = angle,
hjust = hjust, colour = SILICA_USING), size = 3, alpha = 0.6) +
geom_node_point(aes(filter = leaf, x = x * 1.07, y = y * 1.07, fill = factor(group), size = occur),
alpha = 0.6, shape = 21, stroke = 0) +
scale_size_continuous(range = c(0.1, 10)) +
labs(title = "Silica requirement") +
theme_void() +
theme(legend.position = "none",
plot.title = element_text(hjust=0.5),
plot.margin = unit(c(0, 0, 0, 0), "cm")) +
expand_limits(x = c(-1.5, 1.5), y = c(-1.5, 1.5))
bin.sp.edge.silica.plot
# Plot identifying chlorophyll b pigmented species
bin.sp.edge.chlb.plot <- ggraph(bin.sp.mods.mygraph, layout = 'dendrogram', circular = TRUE) +
geom_conn_bundle(data = get_con(from = bin.sp.mods.from, to = bin.sp.mods.to, value = bin.sp.mods.connect$value),
edge_alpha = 0.1, width = 0.3, tension = 0.8, aes(colour = value)) +
scale_edge_color_continuous(low = "cornsilk2", high = "red4") +
geom_node_text(aes(x = x * 1.15, y = y * 1.15, filter = leaf, label = name, angle = angle,
hjust = hjust, colour = CHL_B), size = 3, alpha = 0.6) +
geom_node_point(aes(filter = leaf, x = x * 1.07, y = y * 1.07, fill = factor(group), size = occur),
alpha = 0.6, shape = 21, stroke = 0) +
scale_size_continuous(range = c(0.1, 10)) +
labs(title = 'Chlorophyll'~italic(b)~'pigmented') +
theme_void() +
theme(legend.position = "none",
plot.title = element_text(hjust=0.5),
plot.margin = unit(c(0, 0, 0, 0), "cm")) +
expand_limits(x = c(-1.5, 1.5), y = c(-1.5, 1.5))
bin.sp.edge.chlb.plot